Olink_R_analysis.qmd

Author

Wei Zhu

Published

December 17, 2025


Introduction

2. Core Terminology & Data Structure

  • Projects: A set of plates that are run at the same time and normalized together. If two projects are not randomized or are run at different times, additional normalization is required. A project encapsulates all metadata and run results for a specific study and consists of one or more plates (NPX_Manual.pdf).
  • Plates: The physical processing units, typically utilizing a standard 96-well format.
  • Assays: Individual antibody-based tests designed for specific protein targets. The Olink Explore HT system, for example, features 5,420 protein biomarkers (assays) organized into 8 blocks. Each block includes three internal controls: one incubation control, one extension control, and one amplification control. This adds 8 × 3 = 24 control assays, for a total of 5,444 assays per sample.
  • Normalization: The conversion of raw NGS counts into NPX (Normalized Protein Expression), a relative \(\log_2\) scale.
    • Plate Control (PC) Normalization: PC normalization is the standard “baseline” normalization. It uses the Plate Controls included in every run to account for technical variation between plates. For assay \(i\) and sample \(j\): \[NPX_{i,j} = ExtNPX_{i,j} - \text{median}(ExtNPX_{i, \text{Plate Controls}})\]
      • Note

        More generally, when the Plate Controls in a dataset differ from the reference Plate Control lot used by the analysis pipeline, an internal Plate Control Lot Factor can be applied to Plate Control extNPX values to align them to that reference.

        \[ExtNPX_{i, PC} (\text{adjusted}) = ExtNPX_{i, PC} (\text{raw}) + \text{PC Lot Factor}_i\]

        Because adding a constant to every Plate Control value shifts their median by the same constant, the actual PC normalization formula is:

        \[NPX_{i,j} = ExtNPX_{i,j} - \text{median}(ExtNPX_{i, \text{Plate Controls}}) - \text{PC Lot Factor}_i\]

    • Intensity Normalization: Intensity normalization is a “global” adjustment that uses the actual biological samples to align plates. It is designed to further reduce technical noise and increase statistical power. \[NPX_{i,j} = ExtNPX_{i,j} - \text{median}(ExtNPX_{i, \text{Samples}})\]
      • The median is taken over the samples on the plate, excluding the control strip.
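The two normalization formulas above can be sketched in a few lines of Python. This is a minimal illustration for a single assay on a single plate, not Olink's pipeline: the function names, the dictionary layout, and the toy values are all assumptions made for the example.

```python
from statistics import median


def pc_normalize(ext_npx, plate_controls, lot_factor=0.0):
    """Plate Control normalization for one assay on one plate (illustrative).

    ext_npx:        dict mapping sample id -> ExtNPX for this assay
    plate_controls: list of raw ExtNPX values from the Plate Control wells
    lot_factor:     assay-specific PC Lot Factor (0.0 when no adjustment applies)
    """
    # Align the Plate Controls to the reference lot, then take their median.
    baseline = median([pc + lot_factor for pc in plate_controls])
    return {sample: value - baseline for sample, value in ext_npx.items()}


def intensity_normalize(ext_npx):
    """Intensity normalization: subtract the median of the samples themselves.

    ext_npx must contain only biological samples (control strip excluded).
    """
    baseline = median(list(ext_npx.values()))
    return {sample: value - baseline for sample, value in ext_npx.items()}


# Toy values, invented for illustration only.
samples = {"S1": 5.0, "S2": 6.5, "S3": 4.0}
controls = [3.0, 3.25, 2.75]

pc_npx = pc_normalize(samples, controls, lot_factor=0.5)
intensity_npx = intensity_normalize(samples)
```

With a lot factor of 0.5, the adjusted Plate Control median is 3.5, so `pc_npx["S1"]` equals 5.0 - 3.0 - 0.5, matching the lot-factor form of the formula term by term.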

3. Control Systems & Quality Control (QC)

The platform relies on a hierarchy of controls to ensure data quality (ExploreHT_QC.pdf).

Internal and external controls

The QC workflow

Figure 2: Olink QC workflow

Plate QC

  • Sample QC: Samples and external controls that fail Sample QC are excluded from subsequent QC steps and are not normalized. Only counts are reported for them.
  • Assay QC: A high number of counts for any assay, relative to the internal controls, in any of the Negative Controls is treated as unexpected signal. This step is performed on Negative Controls that pass Sample QC.

4. Platform Reliability: The “CV Gap”

There is a documented discrepancy between manufacturer-reported reliability and independent study results.

Third-Party Findings (Rooney et al., 2025)

  • An independent evaluation using the ARIC cohort (102 split samples) reported lower precision than the manufacturer's figures (Rooney2025_ARIC.pdf).
  • Median CV: 35.7% for the whole Explore HT panel, and 17.6% after excluding values below LOD.
  • Conclusion: High variation is often driven by the large number of assays residing near the technical noise floor in clinical samples.
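To make the effect of the LOD filter on a median CV concrete, here is a small Python sketch. The exact CV formula used by Rooney et al. is not given in this document, so the sketch assumes one common convention: convert NPX back to the linear scale with \(2^{NPX}\), compute a percent CV per split-sample pair, average within each assay, and take the median across assays. All names and toy values are hypothetical.

```python
from statistics import mean, median, stdev


def pair_cv(npx_a, npx_b):
    """Percent CV of one split-sample pair, computed on the linear scale."""
    linear = [2 ** npx_a, 2 ** npx_b]
    return 100 * stdev(linear) / mean(linear)


def assay_cv(pairs, lod=None):
    """Mean pair CV for one assay.

    When lod is given, pairs with either value below lod are dropped.
    Returns None if no pair remains.
    """
    kept = [(a, b) for a, b in pairs if lod is None or min(a, b) >= lod]
    if not kept:
        return None
    return mean([pair_cv(a, b) for a, b in kept])


def median_cv(assays, exclude_below_lod=False):
    """Median of the per-assay CVs, optionally after LOD filtering."""
    cvs = []
    for assay in assays.values():
        lod = assay["lod"] if exclude_below_lod else None
        cv = assay_cv(assay["pairs"], lod)
        if cv is not None:
            cvs.append(cv)
    return median(cvs)


# Toy data, invented for illustration: assay_A has one noisy pair below its LOD.
toy = {
    "assay_A": {"lod": 2.0, "pairs": [(5.0, 5.0), (0.0, 1.0)]},
    "assay_B": {"lod": 1.0, "pairs": [(4.0, 4.25)]},
}

cv_all = median_cv(toy)
cv_filtered = median_cv(toy, exclude_below_lod=True)
```

In this toy data the noisy below-LOD pair inflates `cv_all`, and dropping it lowers `cv_filtered`, which mirrors the direction of the reported change from 35.7% to 17.6%.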

5. Handling the Limit of Detection (LOD)

The LOD is the threshold above which the protein signal is statistically distinguishable from the Negative Control background.

LOD and Data Quality

Reliability is strongly tied to the signal-to-noise ratio (Rooney2025_ARIC.pdf):

  • CV is inversely correlated with the percentage of samples above LOD (\(r = -0.77\)), so precision improves as more samples exceed LOD.
  • Values where \(NPX < LOD\) are dominated by technical noise, leading to artificially inflated CVs.

Best Practices

  1. Filtering for Validation: When calculating IntraCV or InterCV, exclude data points where \(NPX < LOD\).
  2. Imputation for Analysis: For biological discovery, Olink recommends using the original NPX values. However, some researchers replace values below LOD with \(LOD/2\) to stabilize correlation analysis.
  3. Reporting: Always report the “Percent Above LOD” for every assay as a primary quality metric.
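The three practices above can be sketched as small helper functions. This is an illustrative Python sketch, not part of any Olink tooling: the function names are hypothetical, a value counts as "above LOD" when it is greater than or equal to the LOD (an assumption consistent with excluding \(NPX < LOD\)), and the \(LOD/2\) replacement simply follows the convention described in item 2.

```python
def percent_above_lod(values, lod):
    """Percentage of NPX values at or above the assay's LOD."""
    return 100 * sum(v >= lod for v in values) / len(values)


def filter_above_lod(values, lod):
    """Keep only values at or above LOD, e.g. before computing CVs."""
    return [v for v in values if v >= lod]


def impute_below_lod(values, lod):
    """Replace values below LOD with LOD/2 and leave the rest unchanged."""
    return [v if v >= lod else lod / 2 for v in values]


# Toy NPX values for one assay, invented for illustration.
npx = [0.5, 2.0, 3.0, 1.0]
lod = 1.5

pct = percent_above_lod(npx, lod)
validated = filter_above_lod(npx, lod)
imputed = impute_below_lod(npx, lod)
```

Keeping the three operations separate lets the original NPX values stay untouched for discovery analyses while the filtered or imputed copies are used only where needed.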